Genome Analysis Method

ABSTRACT

This invention makes it possible to perform analysis for estimating the characteristics of a population using sample data. By obtaining sample data, embedding genetic (statistical) knowledge in a first and second state variable that have duality, and having the first and second state variables converge to the original value, the characteristics of the population of the sample data are estimated, and the estimated results of the characteristics of the population are output. By doing so, it is possible to perform analysis for estimating characteristics of a population using sample data.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a genome analysis method that performs analysis for estimating the characteristics of a population using sample data.

2. Description of the Related Art

All living organisms on earth are made up of cells, and each cell contains a genome that records its genetic information. Cells are divided into prokaryotic cells and eukaryotic cells according to differences in cell structure. The genome of a prokaryotic cell, such as a bacterium or cyanobacterium, is not partitioned off by a membrane, whereas the genome of a eukaryotic cell, such as an animal or plant cell, lies in a nucleus surrounded by a nuclear membrane.

In other words, a genome indicates a group of chromosomes that are essential for carrying out living activities. Also, the term genome is a compound word that is made from the words gene and chromosome.

Here, the basis of life is the cell. Each cell is surrounded by a cell membrane, and its nucleus is surrounded by a nuclear membrane, such that the independence of each unit is maintained. Human cells comprise specialized cells, which can be categorized according to function and shape into nerve cells, muscle cells, blood cells, immune-system cells, epidermal/epithelial cells (the cells on the surface of the skin and tissue), sensory cells and the like, and undifferentiated cells, called stem cells, which are the source of these cells. An important aspect of cells is that they change with time through cell division, which makes new cells. Cell division is an important mechanism that makes it possible to transmit and express the genetic information of genes.

There are chromosomes inside the nucleus. The chromosomes contain the genetic information, and genes are arranged on them. The genes can also be said to define how proteins are made. The basic substance that makes up a chromosome is DNA (deoxyribonucleic acid), and the genetic information is preserved in the order of the four bases A, T, G and C in the DNA. A haploid living organism, such as some species of bacteria or viruses, has one genome.

In a diploid living organism such as a human, a reproductive cell such as an egg or sperm has one genome set comprising 23 chromosomes, whereas a somatic cell has two genome sets (46 chromosomes). The human genome comprises approximately 3 billion DNA base pairs (3,000 megabase pairs, where 1 megabase pair equals 1 million base pairs), and when arranged in one string has a length of approximately 1 meter.
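The length of approximately 1 meter can be checked with a one-line calculation. The spacing of about 0.34 nm between adjacent base pairs is the commonly cited figure for B-form DNA; it is an added assumption and is not stated in this specification.

```latex
\[
3 \times 10^{9}\ \text{bp} \times 0.34\ \text{nm/bp}
  = 1.02 \times 10^{9}\ \text{nm} \approx 1\ \text{m}
\]
```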

A genome is the collection of all of the genetic information existing in a cell, and it contains information for controlling the genes and gene expression. Here, genes and proteins can be thought of as the design drawing and the product, respectively; in addition to the design drawing, the genome contains a part that manages and controls the production of the product. At the present time the significance of much of the genome is unclear; however, a considerably large percentage of it is thought to have an effect on maintaining life functions. Clarifying these regions will make it possible to gain a more accurate understanding of the life process.

For this reason, a ‘human genome analysis project’ for analyzing the entire base sequence of the human genome, and projects for ‘determining all genome base sequences’ of various kinds of organisms, including humans, are being pursued. Also, by performing three-in-one research of genes and proteins, it will become possible to gain a high level of understanding of the life process.

To that end, it is considered that the network between genes must first be understood. In other words, a plurality of proteins form a network, and those protein groups carry out a certain function. Therefore, by studying functions or corresponding information, a gene having an unknown function may be discovered.

Here, genome analysis is the overall analysis of the genetic information contained in the genome of a living organism, and begins with determining the base sequence (GATC sequence) of the DNA molecules of the genome. However, it is not easy to determine the locations and kinds of genes from the base sequence data alone. Therefore, the analysis also draws on analysis of gene products such as messenger RNA and protein, which are made by transcription and translation; on comparison of how similar base sequences are between species; and on data related to individual genes that were analyzed by experimental biology in Escherichia coli, budding yeast or the like.

Incidentally, in the case of humans, the human genome is the sequence of 3 billion DNA bases included in the total of 46 chromosomes (in other words, DNA molecules): 44 autosomal chromosomes plus the X chromosome and/or Y chromosome. The genome information that we have is inherited from the genome information of our parents in the previous generation, which in turn was inherited from ancestors of even earlier generations. By tracing the origin of genetic information back through previous generations in this way, it is possible to finally reach the genome of the first living organism, approximately 3.8 billion years ago.

As a method of performing genome analysis, patent document 1 discloses a method in which, after genome sequence information is input, it is determined whether or not the input genome sequence information contains a sequence section in which a plurality (for example, 10 or more) of identical bases are arranged continuously; when there is such a section, base sequence information comprising a specified number of bases continuously arranged before and after the section is extracted, and the extracted base sequence information is output.
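The related-art procedure described above can be illustrated with a minimal sketch. This is not the implementation of patent document 1; the function name `extract_flanks`, the default flank length of 20 bases and the returned field names are all hypothetical choices made for illustration. Only the threshold of 10 identical bases is taken from the text.

```python
import re

def extract_flanks(sequence, min_run=10, flank=20):
    """Find runs of at least `min_run` identical bases and return each run
    together with up to `flank` bases before and after it.
    `flank` and the output field names are illustrative assumptions."""
    # One base followed by (min_run - 1) or more repeats of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    hits = []
    for match in pattern.finditer(sequence):
        start, end = match.span()
        hits.append({
            "base": match.group(1),
            "start": start,
            "length": end - start,
            "upstream": sequence[max(0, start - flank):start],
            "downstream": sequence[end:end + flank],
        })
    return hits
```

Calling `extract_flanks(seq)` on an upper-case DNA string returns one record per qualifying run; a sequence with no run of the required length returns an empty list.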

With this kind of method of analyzing genome, it is possible to find polymorphic markers for quickly and efficiently identifying candidate genes related to diseases with accuracy similar to SNPs (single nucleotide polymorphisms) without using SNPs.

However, the method disclosed in patent document 1 is only a method that finds polymorphic markers for identifying candidate genes related to diseases, whereas genome analysis requires analyzing the DNA base sequence of approximately 3 billion base pairs from various viewpoints. Since it is estimated that various methods of analyzing genomes remain undiscovered, discovery of those methods is anticipated.

SUMMARY OF THE INVENTION

Taking into consideration the above conditions, the object of the present invention is to provide a method of analyzing genome that is capable of estimating the characteristics of a population from sample data.

[Patent Document 1]

Japanese Patent No. 2003-288346

The method of analyzing genome of this invention is a method for performing analysis for estimating the characteristics of a population from sample data, and comprises: a process of obtaining the sample data; a process of estimating the characteristics of the population to which the sample data belongs by selecting a first and second state variable having duality according to genetic (statistical) knowledge, and making the first and second state variables converge to the original value; and a process of outputting the results of the estimated characteristics of the population.

Also, the method of analyzing genome further comprises a process of mutually transforming the first and second state variables, which are expressed by each other, using transformation equations as operators in which genetic (statistical) knowledge is embedded, and estimating the first and second state variables by a third state variable that is embedded in those operators.

Moreover, the first state variable is the level of belonging to the original population of each sample, and the second state variable is the haplotype frequency of the original population.

Furthermore, the third state variable is the diplotype and its frequency for each sample.

Also, the method of analyzing genome comprises: a process of setting the gene polymorphism to be investigated; a process of setting allele information by the wet process of the gene polymorphism of the group to be investigated; a process of setting or estimating the haplotype of an individual using the allele information; a process of setting two characteristic parameters that are in the dual state of the group; a process of developing transformation operators between the two characteristic parameters from the genetic information; a process of starting from a specified initial value and finding the two characteristic parameters in order using the transformation operators; and a process of repeating the transformation until the characteristic parameters converge; wherein the characteristics of a population are estimated from the sample data by finding the two characteristic parameters.

BRIEF EXPLANATION OF THE DRAWINGS

FIG. 1 is a drawing for explaining the genome-analysis apparatus that is used in the method of analyzing genome of this invention.

FIG. 2 is a drawing for explaining the analysis performed by the genome-analysis apparatus shown in FIG. 1.

FIG. 3 is a flowchart showing the method of analyzing genome of this invention.

FIG. 4 is a drawing showing an example of the haplotype frequency of two original populations.

FIG. 5 is a drawing showing q_(i) evaluation.

FIG. 6 is a drawing showing the original population mixture ratio of an individual for k=2.

FIG. 7 is a drawing showing the original population mixture ratio of an individual for k=3.

FIG. 8 is a drawing showing p_(k) evaluation for k=2.

FIG. 9 is a drawing showing p_(k) evaluation for k=3.

FIG. 10 is a drawing showing comprehensive data as details of the original populations 1 and 2 for k=2.

FIG. 11 is a drawing showing comprehensive data as details of the original populations 1 to 3 for k=3.

FIG. 12 is a drawing showing comprehensive data as details of the original populations 1 to 4 for k=4.

FIG. 13 is a drawing showing comprehensive data as details of p_(k) evaluation for k=2.

FIG. 14 is a drawing showing comprehensive data as details of p_(k) evaluation for k=2.

FIG. 15 is a drawing showing comprehensive data as details of p_(k) evaluation for k=3.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiments of the invention will be explained below.

FIG. 1 is a drawing for explaining the genome-analysis apparatus that is used in the method of analyzing genome of this invention; FIG. 2 is a drawing for explaining the analysis performed by the genome-analysis apparatus shown in FIG. 1; and FIG. 3 is a flowchart showing the method of analyzing genome of this invention.

As shown in FIG. 1, the genome-analysis apparatus 1 estimates the characteristics of the population from sample data, and outputs the analysis results. It is possible to use a notebook PC, desktop PC or the like having an analysis program that performs the calculation for the genome analysis (described later) as the genome-analysis apparatus 1.

As shown in FIG. 2, the analysis by the genome-analysis apparatus 1 models a real object that can be characterized by two states having duality, a first state A and a second state B. Genetic (statistical) knowledge is embedded in a transformation operator φ and a transformation operator ψ, dual calculation is performed for state A and state B, and the characteristics of the population are estimated by converging to the real (population) value (state).

Here, state A is the level of belonging to the original population for each sample, and state B is the haplotype frequency of the original population. Also, state A and state B are mutually transformed using transformation equations, as operators, in which each is expressed by the other; this will be described in detail later.

Also, in the case where two variables, a first variable and a second variable, that express the characteristics of the population to which the sample data belongs are neither completely independent nor completely dependent, the genome-analysis apparatus 1 has a function of estimating the two variables from a third variable (incomplete data) through which these two variables can be observed. This focuses on the fact that, for example, state A and state B shown in FIG. 2 can be considered to have a kind of duality.

Therefore, the population to which the sample data belongs is considered to be a system expressible in Hilbert space. Also, for example, the two variables are taken to be q_(i) and p_(k) (where i is the sample number, and k is the original population number). Here, q_(i) and p_(k) can be considered as two states that characterize the target system but are not completely independent (entangled states), or in other words, as a kind of so-called duality. Considered in this way, q_(i) and p_(k) can be regarded as being related by transformation operators that transform each into the other, much as the Fourier transformation (and inverse Fourier transformation) relates the particle aspect and wave aspect of light.

Also, those transformation operators are derived from the third variable, which can be observed, for example the diplotype of each sample and its frequency d_(i) (where i is the sample number), and genetic (statistical) knowledge is embedded in those transformation operators. Here, if q_(i) and p_(k) actually have duality, then by giving appropriate initial values to q_(i) and p_(k) and repeatedly performing transformation using the operators, they converge to the characteristics of the original population.

As a specific example, consider estimating the original populations from sample data alone in the case in which the sample group comprises several original populations.

Here,

the level of belonging to the original population of a sample i is taken to be q_(i),

the original population is taken to be K_(k),

the haplotype frequency of the original population k is taken to be p_(k), and

the diplotype frequency of sample i is taken to be d_(i).

Also, q_(i), p_(k) and d_(i) can be expressed as follows:

$q_i = \sum_k c_{ik}\,|K_k\rangle$, where $\sum_k c_{ik} = 1$

$d_i = \sum_{ll'} a_{ill'}\,|h_{il}\rangle|h_{il'}\rangle$, where $\sum_{ll'} a_{ill'} = 1$

$p_k = \sum_m b_{km}\,|h_{km}\rangle$, where $\sum_m b_{km} = 1$

Here, |K_(k)> (original population vector), and |h_(km)>, |h_(il)> and |h_(il′)> (haplotype vectors) can each be considered to be one basis vector of the vector space to which the sample group belongs.

Here, p and q are transformed into each other by projection operators, and can be expressed as shown below.

q=ψp: where ψ is a projection operator.

p=φq: where φ is a projection operator

Here the actual operators can be considered as shown below.

$\psi_{ik} = \sum_{ll'} a_{ill'} \cdot b_{kl} \cdot b_{kl'}$

$\varphi_k = \sum_i c_{ik} \sum_{ll'} a_{ill'}$
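To make the action of ψ concrete, the following worked example evaluates it for a single sample. All numbers are hypothetical and chosen only for illustration: sample i is assumed to have one certain diplotype made of haplotypes 1 and 2 (so its only nonzero coefficient is a_(i12) = 1), and two original populations are assumed with the haplotype frequencies b shown. Dividing by the sum over k, so that the coefficients c_(ik) add up to 1, is an interpretation of the normalization condition on q_(i).

```latex
\begin{align*}
b_{11} &= 0.6,\quad b_{12} = 0.4,\qquad b_{21} = 0.1,\quad b_{22} = 0.9,\qquad a_{i12} = 1\\
\psi_{i1} &= a_{i12}\, b_{11}\, b_{12} = 1 \times 0.6 \times 0.4 = 0.24\\
\psi_{i2} &= a_{i12}\, b_{21}\, b_{22} = 1 \times 0.1 \times 0.9 = 0.09\\
c_{i1} &= \frac{0.24}{0.24 + 0.09} \approx 0.727,\qquad
c_{i2} = \frac{0.09}{0.24 + 0.09} \approx 0.273
\end{align*}
```

Under these assumed frequencies the sample is attributed mostly to original population 1, because the heterozygous diplotype is far more probable there.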

In other words, with the operators φ and ψ, the population to which the sample belongs can be considered as a system expressible in Hilbert space, and q_(i) and p_(k) characterize the target system, can be considered to express two states that are not completely independent (entangled states), and can be handled as a kind of so-called duality.

From these considerations, q_(i) and p_(k) can be regarded as being transformed into each other by operators, and these operators can be derived from d_(i); if q_(i) and p_(k) are found in turn, it is possible to converge on the original value (state) of the population.

Also, with the operator φ, p_(k) is obtained for each k (original population) by adding up the items for which |h_(i)> of each sample and |h_(k)> of each group match, weighted by the probability c_(k), and this can be considered to be the same as standardization. Moreover, with the operator ψ, q_(i) is obtained for each k by adding up b_(k) of the matching |h_(i)> and |h_(k)> at the ratio a_(i) of the simultaneous probability of |h_(ij1)> and |h_(ij2)>, and this can likewise be considered to be the same as standardization. Therefore, starting from an appropriate initial state, q and p are found in turn using the procedure described above until they converge. Convergence can be judged by determining whether or not p and q settle to fixed values.
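The two operators described above can be sketched in a few lines of Python. This is an illustrative reading, not the analysis program of the apparatus. The data layout is an assumption: each sample's diplotype information is a dictionary mapping a pair of haplotype indices (l, l′) to its weight a, `c[i, k]` holds the coefficients of q_(i), and `b[k, m]` holds the coefficients of p_(k). Counting each haplotype of a diplotype once in `phi`, and dividing by the row sum as the "standardization" step, are also assumptions.

```python
import numpy as np

def psi(b, diplotypes):
    """p -> q. For each sample i and population k, accumulate
    a_ill' * b_kl * b_kl' over the sample's diplotypes, then scale each
    row so the coefficients c_ik sum to 1 (assumed standardization)."""
    c = np.zeros((len(diplotypes), b.shape[0]))
    for i, dip in enumerate(diplotypes):
        for (l, lp), a in dip.items():
            c[i] += a * b[:, l] * b[:, lp]
    return c / c.sum(axis=1, keepdims=True)

def phi(c, diplotypes, n_haplotypes):
    """q -> p. For each population k, add c_ik * a_ill' to every haplotype
    carried by the diplotype (the matching step), then scale each row so
    the frequencies b_km sum to 1 (assumed standardization)."""
    b = np.zeros((c.shape[1], n_haplotypes))
    for i, dip in enumerate(diplotypes):
        for (l, lp), a in dip.items():
            b[:, l] += c[i] * a
            b[:, lp] += c[i] * a
    return b / b.sum(axis=1, keepdims=True)
```

A homozygous diplotype (l = l′) contributes its haplotype twice in `phi`, which matches ordinary haplotype counting. The sketch assumes every row has a nonzero sum; a production version would need to guard against division by zero.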

Next, the method of analyzing genome by the genome-analysis apparatus 1 will be explained.

First, as shown in FIG. 3, the gene polymorphism to be investigated is determined (step S1). Next, allele information for the gene polymorphism of the group to be investigated is determined by using a wet process (step S2). Then, the haplotype of each individual is determined or estimated from the allele information (step S3).

Next, the two characteristic parameters in the dual state of the population are determined (step S4). Here, these two characteristic parameters are the level of belonging to the original population of the sample and the haplotype frequency of each original population. Also, transformation operators are developed between the two characteristic parameters by using the genetic information (step S5). The genetic information here is the diplotype of an individual and its frequency.

Then, starting from appropriate initial values, the two characteristic parameters are determined in turn by using the transformation operators (step S6). Then, transformation is repeated until the parameters converge (step S7). After that, the two characteristic parameters are determined (step S8).

EMBODIMENT

Next, an embodiment will be explained.

FIG. 4 to FIG. 15 show an example of the analysis results of the method of analyzing genome, which uses transformation operators that have duality together with genotype data and haplotype data for a plurality of loci in order to deduce the original populations and assign each sample to an original population.

In gene analysis, case-control-correlation analysis is a powerful method for mapping genotype data onto phenotype data (for example, correlation mapping for finding disease genes). However, when the samples are drawn from a structured group, an error may occur in mapping the genotype data, and case-control-correlation analysis may give a false positive result.

Therefore, before performing case-control-correlation analysis, it is preferred that the potential group structure be detected. Methods for detecting the potential group structure and identifying the structured group include the MCMC method based on Bayesian statistics and methods that use a position allele, such as a class model based on the concept of distance between samples; in this embodiment, however, a new modeling method is employed that uses a dual transformation operator algorithm.

In this case, haplotypes are considered to carry more genetic information than alleles, so haplotypes are used instead of alleles. Also, vectors in Hilbert space and their operators are used in the gene analysis of the group structure for case-control-correlation analysis. This is because it is presumed that there is a hidden real entity belonging to the sampled individual.

Here, the vectors in Hilbert space express the genetic state, and the operators transform one vector expression into another vector expression.

Therefore, when the two variables that express the characteristics of the population to which the sample data belongs are neither completely independent nor completely dependent, a method is employed that estimates the two variables from a third variable (incomplete data) through which the two variables can be observed.

In this embodiment, as described above, p_(k), the haplotype frequency of the original population, and q_(i), the level of belonging to the original population of the sample, are used as two characterizing operators that exist in a dual state. By doing this, it is considered that the hidden real object belonging to the sampled individual is estimated. Moreover, in this embodiment, as described above, the diplotype of the individual and its frequency d_(i) are used as observed data.

Here, as described above, q_(i) and p_(k) are two states (entangled states) that characterize the target system and that are not completely independent, and are considered to be a kind of so-called duality. Considered in this way, as described above, q_(i) and p_(k) can be regarded as being related by transformation operators that transform each into the other, much as the Fourier transformation (and inverse Fourier transformation) relates the particle aspect and wave aspect of light.

$q_i = \psi(p_k)$   (1)

$p_k = \varphi(q_i)$   (2)

Therefore, Equation (1) and Equation (2) are assumed for q and p, and these operators are estimated from genetic statistical knowledge.

Also, by taking the diplotype of the individual and its frequency to be d_(i), the states can be expressed in Hilbert space as in the following equations (3) to (5).

$q_i = \sum_k c_{ik}\,|K_k\rangle$, where $\sum_k c_{ik} = 1$   (3)

$d_i = \sum_{ll'} a_{ill'}\,|h_{il}\rangle|h_{il'}\rangle$, where $\sum_{ll'} a_{ill'} = 1$   (4)

$p_k = \sum_m b_{km}\,|h_{km}\rangle$, where $\sum_m b_{km} = 1$   (5)

Further, |K_(k)> (original population vector), and |h_(km)>, |h_(il)> and |h_(il′)> (haplotype vectors) can each be considered to be one basis vector of the vector space to which the sample group belongs.

Also, the following equations (6) and (7) are used as the actual duality transformation operators.

$\psi_{ik} = \sum_{ll'} a_{ill'} \cdot b_{kl} \cdot b_{kl'}$   (6)

$\varphi_k = \sum_i c_{ik} \sum_{ll'} a_{ill'}$   (7)

Then, from these equations, first, in step 1), an appropriate initial value is set for q_(i) from d_(i). However, the initial value must be other than 1/k, where k is the number of original populations. Then, in step 2), p_(k) is determined from equation (7). Then, in step 3), q_(i) is determined from equation (6). Steps 2) and 3) are repeated until p_(k) and q_(i) converge.
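Steps 1) to 3) can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the apparatus's program: the function name `iterate_dual`, the tolerance `tol`, the iteration cap `max_iter` and the use of the largest absolute change as the convergence measure are all hypothetical. The operators of equations (6) and (7) are passed in as the functions `psi` and `phi`, so any implementation of them can be plugged in. The rejection of a uniform start reflects the exclusion of 1/k in the text; a plausible reason is that a uniform q gives every original population the same haplotype frequencies, so the iteration could never separate them.

```python
import numpy as np

def iterate_dual(q0, phi, psi, tol=1e-8, max_iter=1000):
    """Alternate p = phi(q) and q = psi(p) until both stop changing.
    q0 has one row per sample and one column per original population."""
    q = np.asarray(q0, dtype=float)
    n_pop = q.shape[1]
    # Step 1: the initial value must not be the uniform value 1/k.
    if np.allclose(q, 1.0 / n_pop):
        raise ValueError("initial q must differ from the uniform value 1/k")
    p = phi(q)                      # step 2: p from q
    for n in range(1, max_iter + 1):
        q_new = psi(p)              # step 3: q from p
        p_new = phi(q_new)          # step 2 again with the updated q
        delta = max(np.max(np.abs(q_new - q)), np.max(np.abs(p_new - p)))
        q, p = q_new, p_new
        if delta <= tol:
            return q, p, n          # converged after n iterations
    raise RuntimeError("q and p did not converge within max_iter iterations")
```

The function returns the converged q and p together with the number of iterations used, which makes it easy to report how quickly a given data set settles.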

Next, the haplotype frequency data for each original population of a structured population will be explained.

FIG. 4 shows an example of the haplotype frequencies for two groups (original populations). In this example, the haplotype is expressed from six loci, and each locus has two alleles (SNP). Here, “1” indicates the major allele, and “2” indicates the minor allele. The detailed group (original population) information and haplotype frequencies evaluated here can be checked from the comprehensive data shown in FIG. 10.

FIG. 5 shows the q_(i) evaluation, the details of which can be checked from the comprehensive data shown in FIG. 10. It shows the number of original populations comprised in the sampled population and a comparison of the evaluation between the method of this invention and another method. The original populations are difficult to distinguish when their haplotype frequencies are similar; however, better results are obtained as the number of haplotype blocks increases.

For example, I123 is data for a combination of the three haplotype blocks I1, I2 and I3. Moreover, I123456 is data for a combination of I1, I2, I3, I4, I5 and I6. The results for these combinations of haplotype blocks give better agreement than those for a single block.

FIG. 6 shows the original population mixture ratios for each sample when k (number of original populations)=2, and FIG. 7 shows the original population mixture ratios for each sample when k=3. When the mixture ratio of a sample for an original population is “1”, the sample belongs to that one group only; when the mixture ratio is between 0 and 1, the sample belongs to a plurality of original populations at those mixture ratios.
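The reading of the mixture ratios described above can be expressed as a short sketch. The function name `describe_membership`, the output format and the tolerance `tol` used to decide that a ratio is "1" are hypothetical; the text itself only distinguishes a ratio of 1 from a ratio between 0 and 1.

```python
import numpy as np

def describe_membership(q, tol=1e-3):
    """Label each sample as belonging to a single original population
    (one ratio within `tol` of 1) or to several at the given ratios.
    `tol` is an illustrative assumption for numerical output."""
    results = []
    for ratios in np.asarray(q, dtype=float):
        top = int(np.argmax(ratios))
        if ratios[top] >= 1.0 - tol:
            results.append({"kind": "single", "population": top})
        else:
            results.append({"kind": "mixed", "ratios": ratios.tolist()})
    return results
```

For instance, a sample with ratios (1.0, 0.0) is reported as belonging to population 0 alone, while a sample with ratios (0.3, 0.7) is reported as mixed at those ratios.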

FIG. 8 shows the p_(k) evaluation when k=2, and FIG. 9 shows the p_(k) evaluation when k=3. It can be seen that the results of evaluation using duality transformation are identical to or better than the results obtained by the MCMC method. The p_(k) evaluation can be checked from the comprehensive data shown in FIG. 13 to FIG. 15.

Here, FIG. 10 is a drawing showing comprehensive data, which gives details of the original population groups 1 and 2 when k=2, FIG. 11 is a drawing showing comprehensive data, which gives details of the original population groups 1 to 3 when k=3, and FIG. 12 is a drawing showing comprehensive data, which gives details of the original population groups 1 to 4 when k=4.

Also, FIG. 13 and FIG. 14 show comprehensive data, which gives details of p_(k) evaluation when k=2, and FIG. 15 shows comprehensive data, which gives details of p_(k) evaluation when k=3.

In this way, in this embodiment, by obtaining sample data, embedding genetic (statistical) knowledge in a first and second state variable that have duality, and having the first and second state variables converge to what the original values should be, it is possible to estimate the characteristics of the population of the sample data and output the estimated results, so it is possible to perform analysis for estimating characteristics of a population from sample data.

INDUSTRIAL APPLICABILITY

As described above, with this invention it is possible to perform analysis for estimating characteristics of a population from sample data.

1. A method of analyzing genome which performs analysis for estimating the characteristics of a population from sample data, the method comprising the steps of: obtaining said sample data; estimating the characteristics of a population to which said sample data belongs by selecting a first and second state variable having duality according to genetic (statistical) knowledge, and making said first and second state variables converge to what the original value should be; and outputting the results of said estimated characteristics of said population.
2. The method of analyzing genome of claim 1, the method further comprising the steps of: mutually performing transformation by transformation equations as operators that are embedded with genetic (statistical) knowledge in which said first and second state variables are expressed by each other, and estimating said first and second state variables by a third state variable that is embedded in those operators.
3. The method of analyzing genome of claim 1, wherein said first state variable is the level of belonging to the original population of each sample, and said second state variable is the haplotype frequency of the original population.
4. The method of analyzing genome of claim, wherein said third state variable is the diplotype and its frequency for each sample.
5. The method of analyzing genome of claim 1, the method further comprising the steps of: determining the gene polymorphism to be investigated; determining allele information by a wet process of the gene polymorphism of the group to be investigated; determining or estimating the haplotype of an individual using said allele information; determining two characteristic parameters that are in the dual state of the group; developing transformation operators between said two characteristic parameters from the genetic information; starting from a specified initial value and finding said two characteristic parameters in order by using the transformation operators; and repeating the transformation until said characteristic parameters converge; and wherein the characteristics of a population are estimated from said sample data by determining said two characteristic parameters.
6. A device for analyzing genome, which performs analysis for estimating the characteristics of a population from sample data, the device comprising: an input unit to input said sample data; a computing unit programmed to estimate the characteristics of a population to which said sample data belongs by selecting a first and second state variable having duality according to genetic (statistical) knowledge, and making said first and second state variables converge to what the original value should be; and an output unit to output the results of said estimated characteristics of said population.
7. The device for analyzing genome of claim 6, wherein the estimating is performed by mutually performing transformation by transformation equations as operators that are embedded with genetic (statistical) knowledge in which said first and second state variables are expressed by each other, and estimating said first and second state variables by a third state variable that is embedded in those operators.
8. The device for analyzing genome of claim 6, wherein said first state variable is the level of belonging to the original population of each sample, and said second state variable is the haplotype frequency of the original population.
9. The device for analyzing genome of claim 6, wherein said third state variable is the diplotype and its frequency for each sample.
10. The device for analyzing genome of claim 6, wherein the device is adapted to carry out a computing method comprising the steps of: determining the gene polymorphism to be investigated; determining allele information by a wet process of the gene polymorphism of the group to be investigated; determining or estimating the haplotype of an individual using said allele information; determining two characteristic parameters that are in the dual state of the group; developing transformation operators between said two characteristic parameters from the genetic information; starting from a specified initial value and finding said two characteristic parameters in order by using the transformation operators; and repeating the transformation until said characteristic parameters converge; and wherein the characteristics of a population are estimated from said sample data by determining said two characteristic parameters.
11. The method of analyzing genome of claim 2, wherein said first state variable is the level of belonging to the original population of each sample, and said second state variable is the haplotype frequency of the original population.
12. The method of analyzing genome of claim 2, the method further comprising the steps of: determining the gene polymorphism to be investigated; determining allele information by a wet process of the gene polymorphism of the group to be investigated; determining or estimating the haplotype of an individual using said allele information; determining two characteristic parameters that are in the dual state of the group; developing transformation operators between said two characteristic parameters from the genetic information; starting from a specified initial value and finding said two characteristic parameters in order by using the transformation operators; and repeating the transformation until said characteristic parameters converge; and wherein the characteristics of a population are estimated from said sample data by determining said two characteristic parameters.
13. The method of analyzing genome of claim 3, the method further comprising the steps of: determining the gene polymorphism to be investigated; determining allele information by a wet process of the gene polymorphism of the group to be investigated; determining or estimating the haplotype of an individual using said allele information; determining two characteristic parameters that are in the dual state of the group; developing transformation operators between said two characteristic parameters from the genetic information; starting from a specified initial value and finding said two characteristic parameters in order by using the transformation operators; and repeating the transformation until said characteristic parameters converge; and wherein the characteristics of a population are estimated from said sample data by determining said two characteristic parameters.
14. The method of analyzing genome of claim 4, the method further comprising the steps of: determining the gene polymorphism to be investigated; determining allele information by a wet process of the gene polymorphism of the group to be investigated; determining or estimating the haplotype of an individual using said allele information; determining two characteristic parameters that are in the dual state of the group; developing transformation operators between said two characteristic parameters from the genetic information; starting from a specified initial value and finding said two characteristic parameters in order by using the transformation operators; and repeating the transformation until said characteristic parameters converge; and wherein the characteristics of a population are estimated from said sample data by determining